DETAILED ACTION

1. This communication is in response to the Arguments filed on 02/25/2009. Claims
1-33 remain pending and have been examined. The Applicants' amendment and 
remarks have been carefully considered, but they are not persuasive and do not place 
the claims in condition for allowance. Accordingly, this Action has been made FINAL. 

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 



Response to Arguments 

3. Applicant's arguments filed on 01/21/2009 (pages 2-9) have been fully 
considered but they are not persuasive for the reasons discussed below. 

In response to the Applicant's first argument, that the Applicant provides an explicit
definition of the term "runs" (citing paragraph [0037] of the published specification)
under which Nagasaki does not teach such "runs", the Examiner respectfully disagrees.
The definition provided by the Applicant defines a "run" as a sub-sequence of
consecutive identical items in a sequence, where the sub-sequence can be one or more
items long, as is well known in the art. The examples provided by the Applicant are not
themselves an explicit definition, since they are merely examples of the defined term
"run". Nagasaki does teach determination of a number of runs: in Nagasaki col. 7, lines
10-16, such runs comprise energy values for the analysis regions of a frame (four
sequential values are determined for the frame), where values over 1000 by themselves
represent voice and values below 1000 represent noise (the consecutive identical
elements being values that are each individually greater than, or each individually less
than, 1000). Thus, Nagasaki is consistent with the definition provided by the Applicant
in the Specification.
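For illustration only, the reading of "runs" applied above can be sketched as follows. The sketch assumes that each of the four analysis-region energy values of a frame is labeled voice or noise against the 1000 threshold cited from Nagasaki, and that a run is a maximal sub-sequence of identical labels. The function names and sample energies are hypothetical and are not taken from Nagasaki or from the Specification.

```python
def count_runs(seq):
    """Return the number of maximal runs of consecutive identical items in seq."""
    if not seq:
        return 0
    runs = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur != prev:  # a change of value starts a new run
            runs += 1
    return runs


def label_regions(energies, threshold=1000.0):
    """Label each analysis-region energy 'voice' if above threshold, else 'noise'.

    Treating a value exactly equal to the threshold as noise is an assumption.
    """
    return ["voice" if e > threshold else "noise" for e in energies]


# Hypothetical energy values for the four analysis regions of one frame.
energies = [1500.0, 1200.0, 400.0, 1100.0]
labels = label_regions(energies)
print(labels)              # ['voice', 'voice', 'noise', 'voice']
print(count_runs(labels))  # 3
```

Under this reading, a run of length one (the single "noise" region above) still counts as a run, consistent with the definition discussed above.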

In response to the Applicant's second argument, regarding speech not being
random when it is not known, the Examiner respectfully disagrees. Determining the
absence or presence of speech from an energy calculation compared against a
threshold does make the decision random, since, depending on the energy calculation,
there is a specific probability that the signal is speech or not. If the determination of
whether speech is present in future frames were made using prior values, speech would
be considered not random, since past information would be taken into account.
However, in Nagasaki, such a determination does not rely on previous speech
determinations but rather on the energy values for the specific frame, as described in
col. 7, lines 5-15.

In response to the Applicant's third argument, that the combination of Nagasaki
and Kushner would be redundant and inoperative because both references determine
the presence of speech, the Examiner respectfully disagrees. Kushner describes in col.
4, lines 29-37 a speech/noise classifier for determining speech or noise for each frame.
The incorporation of the tertiary reference of Nagasaki modifies the primary reference of
Kushner by classifying such frames using a different methodology, described in
Nagasaki col. 7, lines 5-16. In the cited section of Nagasaki, random parameters
(energy values) are calculated for each analysis region, which indicate the presence or
absence of speech for that specific time instance. Thus, the averaged energy value is
the random parameter of the series of calculated energy values, which allows for the
determination of noise or speech. Thus, Kushner's frame state determination, modified
by the use of speech presence as taught by Nagasaki above, would not be redundant or
inoperable, but rather would further modify the frame state determination of Kushner by
relying on the average energy being compared to a threshold for that frame.
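As an illustrative sketch only, the modified frame state determination described above (average energy of a frame's analysis regions compared to a threshold) might look like the following. It assumes energy is the sum of squared samples, that the frame is split into four equal analysis regions, and that the threshold value is a free parameter; these choices and the function names are hypothetical and are not taken from Kushner or Nagasaki.

```python
def region_energies(frame, num_regions=4):
    """Split frame into num_regions equal analysis regions and return the
    energy (sum of squared samples) of each region. Samples left over after
    the last full region are ignored."""
    size = len(frame) // num_regions
    return [
        sum(x * x for x in frame[i * size:(i + 1) * size])
        for i in range(num_regions)
    ]


def classify_frame(frame, threshold, num_regions=4):
    """Return 'voice' if the average region energy exceeds threshold, else 'noise'."""
    energies = region_energies(frame, num_regions)
    average = sum(energies) / len(energies)
    return "voice" if average > threshold else "noise"


loud = [10.0] * 8   # each 2-sample region has energy 200.0
quiet = [1.0] * 8   # each 2-sample region has energy 2.0
print(classify_frame(loud, threshold=100.0))   # voice
print(classify_frame(quiet, threshold=100.0))  # noise
```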

In response to the Applicant's fourth argument, that there is no relationship
between Nagasaki and the random parameter extraction unit, the frame state
determination unit, and the voice region detection unit, the Examiner respectfully
disagrees for the reasons discussed in the prior paragraph showing the relationship
between the primary reference of Kushner and the tertiary reference of Nagasaki.
Further, a relationship exists for the random parameter extraction unit receiving frames
from the whitening unit. Nagasaki col. 5, line 5 describes that the speech signal is
contaminated, i.e., contains noise along with the speech. Hence, the use of a whitened
signal would have been obvious in order to mimic a real-time environment, where the
secondary reference of Durlach has been incorporated for the addition of noise to the
primary reference of Kushner. Hence, there are relationships present for each of the
claimed elements among the primary, secondary, and tertiary references.

In response to the Applicant's fifth argument, that the Kushner and Durlach
references cannot be properly combined since Durlach discloses directional information
using a diversified microphone system, the Examiner respectfully disagrees with this
assertion. The test for obviousness is not whether the features of a secondary
reference may be bodily incorporated into the structure of the primary reference; nor is it
that the claimed invention must be expressly suggested in any one or all of the
references. Rather, the test is what the combined teachings of the references would
have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208
USPQ 871 (CCPA 1981). The teaching relied on from the secondary reference, Durlach,
is the addition of noise to a signal source (see col. 5, lines 56-65). This teaching
modifies the primary reference (Kushner) such that white noise is added to the input
frames cited in Kushner col. 4, lines 6-7. Furthermore, in col. 3, lines 33-37, Kushner
teaches that multiple microphones may be present. Hence, the teachings of Kushner,
Durlach, and Nagasaki would have been combinable using known methods, as stated
above, to obtain predictable results. Since only knowledge within the level of ordinary
skill in the art at the time the claimed invention was made was taken into account, such
a reconstruction is proper. The motivation for combining Kushner in view of Durlach is
found in the secondary reference, col. 5, lines 61-62 and col. 1, lines 10-14: adding noise
that normally occurs in speech recognition, specifically as a result of directional
information, which simulates environmental conditions as needed.

Hence, all rejections are maintained as per the previous Office Action. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 
5. Claims 1, 2, 4, 5, 6, 18, 19, 22, and 23 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Kushner et al. (US 6,321,197) in view of Durlach et al. (US
5,828,997) in view of Nagasaki (US 6,629,070).

As to claims 1 and 18, Kushner et al. teaches a voice region detection apparatus, 
comprising: 

a preprocessing unit for dividing an input voice signal into input frames
(see col. 4, lines 6-7, the acquisition window is segmented into frames) comprised
of a sequence of elements having a number of runs (see the mapping of
Nagasaki below);

a frame state determination unit for classifying the frames into voice 
frames and noise frames (see col. 4, lines 29-37, speech/noise classifier done by 
microprocessor 110) based on the random parameters extracted by the random 
parameter extraction unit; and 

a voice region detection unit (see col. 5, lines 9-14, microprocessor 110)
for detecting a voice region by calculating start (see col. 6, lines 8-9 and Figure
2, microprocessor 110 determines the starting point and ending point of the
speech utterance) and end positions of a voice (see col. 6, lines 65-66 and Figure
2, the endpoint is determined) based on the voice and noise frames input from the
frame state determination unit (e.g., from the determination of a speech utterance,
the voice regions are detected based on energy).
However, Kushner et al. does not specifically teach the whitening unit for
combining white noise to the input frames. 

Durlach et al. does teach the whitening unit combining white noise to the
input frames (see col. 5, lines 56-65 and Figure 2, target signals (speech) 50a,
50b, and 50n are added to the output of noise generator 60 by mixer 56).
Although Durlach does not specifically use white noise when adding noise to the
target signals, it would have been obvious to add white noise, or any other type of
noise, to a signal depending on the environment being simulated.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. with the addition of a whitening unit as taught by Durlach et al. The
motivation to have combined the references involves the ability to incorporate the
directionality of a signal for sound localization (see Durlach et al., col. 5, lines
60-65), which would benefit the preprocessed signal of Kushner et al. for real-time
environmental simulation.

However, Kushner et al. in view of Durlach et al. do not specifically teach 
the random parameter extraction unit. 

Nagasaki does teach the random parameter extraction unit (see col. 6,
lines 54-61 and col. 7, lines 1-5, thresholds are used to determine whether voice is
present or not present on the basis of energy) for extracting random parameters
indicating the randomness of frames (see col. 6, line 66 to col. 7, line 15, voice
activity is detected based on the comparison to a threshold. The determination of
energy is random since it is not known whether the frame is voice or noise.
Further, the randomness is addressed by indicating the noise or voice present in
the signal);

wherein the random parameter extraction unit extracts a random
parameter for a frame input from the whitening unit (see col. 6, lines 54-60, voice
presence is determined based on the intensity of energy in the analysis region)
based on a determination of the number of runs in said frame (see col. 7, lines
10-15, the number of runs corresponds to the energy values at each analysis
region into which the frame is divided (col. 4, lines 45-51)).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. in view of Durlach et al. with the addition of a random parameter
extraction unit as taught by Nagasaki for the purpose of accurately determining
voice frames and preventing pulse noise from being considered voice (see
Nagasaki, col. 2, lines 22-36).

As to claims 2 and 19, Kushner et al. in view of Durlach et al. in view of Nagasaki
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Kushner et al. teaches wherein the preprocessing unit
samples the input voice signal according to a predetermined frequency (see col.
3, line 66 to col. 4, line 10, digitize) and divides the sampled voice signal into a
plurality of frames (see col. 4, lines 4-10, segmentation into frames is performed)
(e.g., the digitization of the voice signal makes the use of a sampling frequency
obvious, as the signal is sent to the microprocessor for further processing. It
would have been obvious for this sampling frequency to satisfy the Nyquist
criterion.)



As to claims 4 and 21, Kushner et al. in view of Durlach et al. in view of Nagasaki
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Durlach et al. teaches wherein the whitening unit comprises 
a white noise generation unit (see Figure 2, noise generator 60) for generating 
the white noise, and a signal synthesizing unit (see Figure 2, mixer 56) for 
combining the frames input from the preprocessing unit (see signals 50a, 50b, 
and 50n) with the white noise generated by the white noise generation unit (e.g. 
Noise is added to the target signal.). 
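As a non-authoritative illustration of the mapping above, a white noise generation unit and a signal synthesizing unit can be sketched as two small functions. The sketch assumes zero-mean Gaussian white noise and simple sample-by-sample addition; the function names, the noise level `sigma`, and the sample frame are hypothetical and do not describe Durlach's noise generator 60 or mixer 56 in detail.

```python
import random


def generate_white_noise(length, sigma, seed=None):
    """White noise generation unit: independent zero-mean Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(length)]


def synthesize(frame, noise):
    """Signal synthesizing unit: add the noise sample-by-sample to the frame."""
    if len(frame) != len(noise):
        raise ValueError("frame and noise must have the same length")
    return [s + n for s, n in zip(frame, noise)]


frame = [0.2, -0.1, 0.4, 0.0]  # hypothetical frame samples
noise = generate_white_noise(len(frame), sigma=0.05, seed=1)
whitened = synthesize(frame, noise)
print(len(whitened))  # 4
```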



As to claims 5 and 22, Kushner et al. in view of Durlach et al. in view of Nagasaki
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Durlach et al. does teach the whitening unit combining white
noise to the input frames (see col. 5, lines 56-65 and Figure 2, target signals
(speech) 50a, 50b, and 50n are added to the output of noise generator 60 by mixer 56).

Furthermore, Nagasaki does teach that each of said runs consists of
consecutive identical elements in the sequence of elements that comprise the
frame (see col. 7, lines 10-15, energy values are identically computed for each
subframe).
As to claims 6 and 23, Kushner et al. in view of Durlach et al. in view of Nagasaki
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Nagasaki does teach wherein the random parameter is
NR = R/n, where NR is the random parameter of a frame, n is half of the length of
the frame, and R is the number of runs in the frame (see col. 7, lines 11-14, where
the number of runs, specifically the cumulative sum of the four runs, is totaled and
divided by the total number of sub-frames (four sub-frames) to determine voice
presence).
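For illustration only, the claimed formula NR = R/n can be sketched as below, with R counted as the number of maximal runs of identical elements and n taken as half the frame length. Which elements form the sequence is not specified in this Action; the sketch assumes, hypothetically, that each element is the sign of a sample. The function names are likewise hypothetical, and the sketch shows the formula as recited rather than Nagasaki's energy averaging.

```python
def count_runs(seq):
    """Return the number of maximal runs of consecutive identical items in seq."""
    if not seq:
        return 0
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)


def random_parameter(frame):
    """NR = R / n, with R the number of runs and n half the frame length.

    Hypothetical choice: the sequence of elements is the sign of each sample.
    """
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    elements = [x >= 0 for x in frame]
    r = count_runs(elements)
    n = len(frame) / 2
    return r / n


print(random_parameter([1.0, -1.0, 1.0, -1.0]))  # 2.0 (R = 4, n = 2)
print(random_parameter([1.0, 2.0, 3.0, 4.0]))    # 0.5 (R = 1, n = 2)
```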

6. Claims 3 and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kushner et al. in view of Durlach et al. in view of Nagasaki as applied to claims 2 and 19 
above, and further in view of Mekuria (US 6,182,035). 

As to claims 3 and 20, Kushner et al. in view of Durlach et al. in view of 
Nagasaki teach all of the limitations as in claims 2 and 19 above. 

However, Kushner et al. in view of Durlach et al. in view of Nagasaki do 
not specifically teach the frames overlapping with one another. 

Mekuria does teach the overlapping of frames (see col. 8, lines 28-29).
It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Nagasaki with the overlapping of
frames as taught by Mekuria. The motivation to have combined the references
involves the use of samples in more than one frame (see Mekuria col. 8, lines
28-29).
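A minimal sketch of overlapping framing, in which samples are used in more than one frame, is given below for illustration. The frame length and hop size are hypothetical parameters chosen for the example; they are not values taken from Mekuria or from the claims.

```python
def split_into_frames(samples, frame_len, hop):
    """Divide samples into frames of frame_len samples whose start points are
    hop samples apart. hop < frame_len yields overlapping frames, so samples
    appear in more than one frame; hop == frame_len yields non-overlapping
    frames. Trailing samples that do not fill a whole frame are dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


samples = list(range(8))
print(split_into_frames(samples, frame_len=4, hop=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

With a hop of half the frame length, as in the usage line, each interior sample belongs to two frames.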

7. Claims 7-9 and 24-26 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Kushner et al. in view of Durlach et al. in view of Nagasaki as applied to claims 1 
and 18 above, and further in view of Pastor (US 5,572,623). 

As to claims 7 and 24, Kushner et al. in view of Durlach et al. in view of Nagasaki 
teach all of the limitations as in claims 1 and 18, above. 

However, Kushner et al. in view of Durlach et al. in view of Nagasaki do
not specifically teach the voice frames including vocal frames and fricative
frames.

Pastor does teach the frames including vocal and fricative frames (see col. 
4, lines 66-67 and col. 5, lines 5-14). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Nagasaki with the inclusion of
fricative frames as taught by Pastor. The motivation to have combined the
references involves the inclusion of fricatives that are present at the start and
end of speech (see Pastor col. 1, lines 29-33).

As to claims 8, 9, 25, and 26, Kushner et al. in view of Durlach et al. in view of
Nagasaki in view of Pastor teach all of the limitations as in claims 7 and 24, above.
Furthermore, Nagasaki teaches wherein the frame state determination unit
(e.g., voice activity detector) determines that if the random parameter of a frame
extracted by the random parameter extraction unit is below a first threshold (see
col. 7, lines 2-14, voice activity is detected based on the comparison to a
threshold; the determination of energy is random since it is not known whether
the frame is voice or noise), then the relevant frame is a vocal frame (e.g., if the
noise is below the value of the threshold, then speech is present, i.e., the frame is
a vocal frame). The use of a specific threshold would have been obvious to one
skilled in the art in order to distinguish voice from noise. Hence, the use of below
or above a threshold is a matter of design choice and relativity. The Applicants do
not indicate reasons for selecting the stated thresholds (see Applicant's
Specification, page 11, lines 17-21).

8. Claims 10, 11, 27, and 28 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Kushner et al. in view of Durlach et al. in view of Nagasaki in view of 
Pastor as applied to claims 8 and 25 above, and further in view of Chong-White et al. 
(US 7,065,485). 

As to claims 10, 11, 27, and 28, Kushner et al. in view of Durlach et al. in
view of Nagasaki in view of Pastor teach all of the limitations as in claims 8 and 
25, above. 

However, Kushner et al. in view of Durlach et al. in view of Nagasaki in
view of Pastor do not specifically teach that if the random parameter of a frame
extracted by the random parameter extraction unit is above a second threshold,
the relevant frame is a fricative frame.

Chong-White et al. does teach that if the random parameter of a frame
extracted by the random parameter extraction unit (see col. 7, lines 22-25, an
energy ratio is computed, similar to Nagasaki) is above a second threshold (see
col. 7, lines 46-47, fricatives are identified when above a threshold), the relevant
frame is a fricative frame. As to claims 11 and 28, it would have been obvious to
select a threshold value for comparing different types of values of a signal with
respect to a ratio.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Nagasaki in view of Pastor with
the inclusion of a threshold indicating a fricative as taught by Chong-White et al.
The motivation to have combined the references involves the ability to detect
further unvoiced components in a signal consisting of speech and non-speech.
Furthermore, the voice recognition as taught by Kushner et al. in view of Durlach
et al. in view of Nagasaki in view of Pastor allows detection of the noise, voice,
and fricatives contained in the signal.

As to claims 12, 13, 29, and 30, Kushner et al. in view of Durlach et al. in view of 
Nagasaki in view of Pastor in view of Chong-White teach all of the limitations as in 
claims 8 and 25, above. 
Furthermore, Nagasaki teaches wherein the frame state determination unit
determines that if the random parameter of the frame extracted by the random 
parameter extraction unit is below the second threshold, the relevant frame is a 
noise frame (see col. 7, lines 2-14, voice activity is detected based on the 
comparison to a threshold. The determination of energy is random since it is not 
known whether the frame is voice or noise). 

However, Nagasaki does not specifically teach the use of two thresholds 
for comparison. 

It would have been obvious to use multiple thresholds for classifying each
frame so that voice and fricative frames can be detected, as taught by
Chong-White above, in order to improve detection accuracy. Further, the values
of the thresholds used are a matter of design choice based on the thresholds
computed.
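The two-threshold classification discussed for claims 8-13 and 25-30 can be sketched, for illustration only, as follows. The sketch assumes the first threshold is lower than the second, and that a frame below the first threshold is classified as vocal before the "below the second threshold" noise test is applied; the threshold values and the function name are hypothetical and are not taken from the Specification or the cited references.

```python
def classify_frame_state(nr, first_threshold, second_threshold):
    """Classify a frame from its random parameter nr using two thresholds.

    Assumes first_threshold < second_threshold:
      nr below first_threshold  -> vocal frame
      nr above second_threshold -> fricative frame
      otherwise                 -> noise frame
    """
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be less than second_threshold")
    if nr < first_threshold:
        return "vocal"
    if nr > second_threshold:
        return "fricative"
    return "noise"


# Hypothetical threshold values chosen only for illustration.
for nr in (0.3, 1.0, 1.8):
    print(nr, classify_frame_state(nr, first_threshold=0.6, second_threshold=1.4))
```

Checking the first threshold before the second is a design choice in this sketch that keeps the three classes disjoint.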

9. Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Kushner et al. in view of Durlach et al. in view of Nagasaki as applied to claim 1 above, 
and further in view of Rezayee et al. ("An Adaptive KLT Approach for Speech 
Enhancement"). 

As to claim 14, Kushner et al. in view of Durlach et al. in view of Nagasaki teach 
all of the limitations as in claim 1, above. 
However, Kushner et al. in view of Durlach et al. in view of Nagasaki do
not specifically teach a color noise elimination unit for eliminating color noise 
from voice. 

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Nagasaki with the inclusion of a color noise eliminator as taught by
Rezayee et al. The motivation to have combined the references is that colored
noise consists of various noise variances and is not the same as white noise,
which has the same variance (see Rezayee et al. page 87, right column, 3rd
paragraph, lines 12-17).

10. Claims 15 and 31 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Kushner et al. in view of Durlach et al. in view of Nagasaki in view of Pastor in view 
of Chong-White et al. (US 7,065,485) as applied to claims 10 and 27, above and further 
in view of Rezayee et al. ("An Adaptive KLT Approach for Speech Enhancement"). 

As to claims 15 and 31, Kushner et al. in view of Durlach et al. in view of
Nagasaki in view of Pastor in view of Chong-White et al. teach all of the limitations as in 
claim 1, above. 
However, Kushner et al. in view of Durlach et al. in view of Nagasaki in
view of Pastor in view of Chong-White et al. do not specifically teach a color 
noise elimination unit for eliminating color noise from voice. 

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Nagasaki in view of Chong-White with the inclusion of a color noise
eliminator as taught by Rezayee et al. The motivation to have combined the
references is that colored noise consists of various noise variances and is not
the same as white noise, which has the same variance (see Rezayee et al. page 87,
right column, 3rd paragraph, lines 12-17). Furthermore, it should be noted that this
elimination of colored noise is performed when speech is present. Hence, the
detection of a vocal frame indicates that speech is present, and the signal is then
further enhanced by removing the colored noise.

Allowable Subject Matter 

11. Claims 16, 17, 32, and 33 are objected to as being dependent upon a rejected
base claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

12. The following is a statement of reasons for the indication of allowable subject 
matter: None of the prior art alone or in combination teaches the following limitations: 
"color noise ... obtained... amount of reduction in the random parameter... due to color
noise" as recited in claims 16, 17, 32, and 33. 



Conclusion 

13. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later
than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to PARAS SHAH whose telephone number is
(571)270-1650. The examiner can normally be reached on MON.-THURS. 7:00 a.m.-4:00 p.m.
EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth can be reached on (571)272-7843. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 